Generation for Affine Subscripts in Data-Parallel
Authors
Abstract
This paper presents an efficient compilation technique to generate the local memory access sequences for block-cyclically distributed array references with affine subscripts in data-parallel programs. For the memory accesses of an array reference with an affine subscript within a two-nested loop, repetitive patterns exist at both the outer and inner loops. We use tables to record the memory accesses of these repetitive patterns. Based on these tables, a new start-computation algorithm is proposed to compute the starting elements on a processor for each outer loop iteration. The complexity of the table construction is $O(k + s_2)$, where $s_2$ is the access stride for the inner loop. After the tables are constructed, the starting element for each outer loop iteration can be generated in $O(1)$ time. Moreover, we find that the repetitive pattern of the outer loop spans $Pk/\gcd(Pk, s_1)$ iterations instead of $Pk$, where $s_1$ is the access stride for the outer loop. Therefore, the total complexity to generate the local memory access sequences for a block-cyclically distributed array with an affine subscript in a two-nested loop is $O(Pk/\gcd(Pk, s_1) + k + s_2)$.
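The outer-loop period stated in the abstract can be checked with a small brute-force sketch. This is not the paper's table-based algorithm; it only enumerates accesses directly. It assumes the usual block-cyclic convention in which P is the number of processors and k is the block size (the abstract does not define them), and all function names (`owner_and_local`, `local_accesses`, `outer_period`, `observed_period`) are hypothetical, chosen for illustration. Indices are assumed non-negative.

```python
from math import gcd


def owner_and_local(g, P, k):
    """Map global index g to (processor, local address).

    Assumes a block-cyclic layout with block size k over P processors.
    """
    course, offset = divmod(g, P * k)  # which round of blocks, position in it
    return offset // k, course * k + offset % k


def local_accesses(p, P, k, s1, s2, offset, n1, n2):
    """Brute force: local addresses touched by processor p for the
    reference A[s1*i1 + s2*i2 + offset], grouped by outer iteration i1."""
    rows = []
    for i1 in range(n1):
        row = []
        for i2 in range(n2):
            proc, local = owner_and_local(s1 * i1 + s2 * i2 + offset, P, k)
            if proc == p:
                row.append(local)
        rows.append(row)
    return rows


def outer_period(P, k, s1):
    """Closed form from the abstract: Pk / gcd(Pk, s1)."""
    return (P * k) // gcd(P * k, s1)


def observed_period(P, k, s1):
    """Smallest t > 0 for which s1*t is a multiple of Pk, found by search."""
    pk = P * k
    return next(t for t in range(1, pk + 1) if (s1 * t) % pk == 0)
```

With P = 4, k = 4, and s1 = 6, for instance, both `outer_period` and `observed_period` give 8 rather than Pk = 16, which matches the shorter period the abstract describes.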
Similar resources
Code Generation for Complex Subscripts in Data-Parallel Programs
Data parallel languages like High Performance Fortran, demand efficient compile and run-time techniques for tasks such as address generation. Array references with arbitrary affine subscripts can make the task of compilers for such languages highly involved. This paper deals with the efficient address generation in programs with array references having two types of commonly encountered affine r...
Parallel Generation of t-ary Trees
A parallel algorithm for generating t-ary tree sequences in reverse B-order is presented. The algorithm generates t-ary trees as 0-1 sequences, and each 0-1 sequence is generated in constant average time O(1). The algorithm is executed on a CREW SM SIMD model, and is adaptive and cost-optimal. Prior to the discussion of the parallel algorithm a new sequential generation with O(1) average time ...
An Efficient Algorithm for Workspace Generation of Delta Robot
Dimensional synthesis of a parallel robot may be the initial stage of its design process, which is usually carried out based on a required workspace. Since the link lengths of the robot are usually optimized for the workspace, the workspace computation process must be run numerous times. Hence, importance of the efficiency of the algorithm and the CPU time of the workspace computatio...
Multi-dimensional Interval Test
In this paper, we propose a sophisticated technique of data dependence analysis for distributed-memory parallel environments that is used for converting sequential code into a parallel form targeted for a particular architecture. Two-dimensional arrays with subscripts formed by induction variables appear quite frequently in real programs [10]. We test if there are integer-valued solutions for two...
Generation of Dataflow Fine-grain Parallel Data-structures on a Distributed-memory Computer
Dataflow-based fine-grain parallel data-structures provide high-level abstraction to easily write programs with potentially high parallelism. In order to show the feasibility of a fine-grain dataflow paradigm, we are now implementing a non-strict dataflow language on off-the-shelf computers, including a distributed-memory parallel machine. The results of preliminary experiments indicate that the inefficie...